WB Homework 3

Agata Kaczmarek

Sources of the model and knowledge:

At the beginning I planned to train the model on a GPU, but at the end of the day I trained it on the CPU (which required a looot of patience :)).

The dataset covers various sports. The downloaded data is divided into three folders: train, test and valid. Each of them contains 100 sub-folders named after the classes (the same names in all three folders), in which the images are stored. The whole set consists of over 13000 pictures. I tried training the net on all of them, but after about 11 hours of training I gave up and interrupted it, so the net is trained on about 6000 images split into 100 classes.

ImageFolder is a "magical" data loader that is very useful when loading data from a folder-per-class layout.

A bit of processing of the class names - each class name is the name of the folder in which its photos are stored. As we can see, there are quite a lot of different disciplines, from swimming to hockey.

Below is the function used to evaluate the model. It was used during training to see how things were going and whether the model was overfitting. Additionally, parts of this function were reused later to check the accuracy on batches from the train, test and validation sets, to show that the model is not overfitted.
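
The evaluation can be sketched as a simple accuracy loop over a data loader (this is my reconstruction of the idea, not the exact function from the notebook):

```python
import torch


@torch.no_grad()  # no gradients needed for evaluation
def evaluate(model, loader, device="cpu"):
    """Return the fraction of correctly predicted images over a whole loader."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)   # class with the highest logit
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

Running the same function on the train, test and validation loaders and comparing the three scores is exactly the overfitting check described below: similar numbers on all three sets suggest the model is not overfitted.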

The model used here is a pretrained ResNet50, with CrossEntropyLoss and Adam as the optimizer. The website linked above proposed training for 4 epochs, but that took too long on my computer, so I decided to interrupt before the end of the first epoch. Even so, the results seem to be good, especially given how many classes there are.

In case anything went wrong while doing this homework (and to be able to reopen it quickly later), I wrote both the trained model and the information from the first batch of the test loader - images (inputs), labels and predicted labels - to files. This also kept the pictures I explain the same every time, so I was able to draw some conclusions.
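
This kind of checkpointing can be sketched with `torch.save` / `torch.load` (the file names and the tiny stand-in model here are assumptions):

```python
import os
import tempfile

import torch

model = torch.nn.Linear(4, 2)              # stand-in for the trained ResNet50
inputs = torch.randn(16, 4)                # stand-in for the first test batch
labels = torch.randint(0, 2, (16,))
with torch.no_grad():
    preds = model(inputs).argmax(dim=1)    # predicted labels for that batch

outdir = tempfile.mkdtemp()
# save the trained weights and the first test batch with its predictions
torch.save(model.state_dict(), os.path.join(outdir, "model.pt"))
torch.save({"inputs": inputs, "labels": labels, "preds": preds},
           os.path.join(outdir, "first_batch.pt"))

# reloading later: same weights, same pictures, same predictions
model.load_state_dict(torch.load(os.path.join(outdir, "model.pt")))
batch = torch.load(os.path.join(outdir, "first_batch.pt"))
```

Saving the batch alongside the model is what guarantees that the explained pictures stay identical across notebook sessions.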

Checking the scores on the validation, train and test sets. The model does not seem to be overfitted.

Additionally, the photos are 224 x 224 pixels, and the total number of images in the train set is over 13000 (however, as stated before, not all were used).

Pictures with both the true and predicted labels for the previously saved batch from the test set. The images seem to be in good resolution; in most cases we can easily say which sport is in the picture. However, due to the normalization applied at the beginning, some of the photos look a bit too dark. In this batch we have 12 correct predictions and 4 wrong ones.

In some cases I think I would also have problems recognizing what is happening in the pictures, for example the picture in the bottom row between the horses and the gymnast. Additionally, for the second one (rollerblade racing) I was surprised that the model got it right, as the rollerblades are hardly visible. Interestingly, the model made a mistake on the picture with the snowboard - to me it is clearly recognizable - so later I will take a look at the model's explanation; maybe it will help me understand why it decided to predict 'ice climbing' instead of 'snowboarding'.

Visualization of 6 chosen photos, each explained with 3 methods

Below I present 6 correctly classified photos with their explanations. Each of them was explained with the same three methods. I could have written a function to do the calculations and plotting, as they were similar in all cases, but I decided not to: this way the results of each piece of code are shown directly below it, which will help when coming back to this code while doing the project. The methods used are LIME, IntegratedGradients and SHAP.
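
Of the three methods, IntegratedGradients is simple enough to sketch by hand in plain PyTorch (the homework presumably used a library implementation such as Captum; this hand-rolled version is only to show the idea - average the input gradients along a straight path from a baseline image to the input, then scale by the difference from the baseline):

```python
import torch


def integrated_gradients(model, x, target, baseline=None, steps=50):
    """Attribution of each input value to the logit of `target`.

    Averages gradients along the straight path from `baseline` (here a
    black image) to `x`, then scales by (x - baseline).
    """
    if baseline is None:
        baseline = torch.zeros_like(x)
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0, 1, steps):
        # interpolated point between the baseline and the input
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model(point)[0, target].backward()      # gradient of the target logit
        grads += point.grad
    return (x - baseline) * grads / steps
```

Positive attributions are the "green" regions in the plots below (they push the prediction towards the class), negative ones push away from it, and near-zero values are the "neutral surfaces" mentioned in several of the explanations.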

First correct photo

A very dark picture for me, but still legible. I was curious what made the model think this is the sport called 'rings'. I thought it was those round rings, but let's see...

According to the LIME explanation, this prediction seems to have been some kind of random guess, as the explanation highlights something around part of a man's foot - certainly not the rings.

IntegratedGradients still does not give much information (though much more than LIME). There are no 'special places' that, as a whole region, made the model decide. Interestingly, the rings can be seen in the picture, but as a neutral surface - one that has neither positive nor negative influence on the prediction.

Only SHAP seems to explain the image the way we would. It seems to point at the rings (the visible green parts of both rings) and the position of the body (horizontal, not vertical).

Second correct photo

This picture seems characteristic, as it clearly shows people in wheelchairs doing sport, so I expected the wheels to be the most important part for the model's decision.

As assumed, even LIME showed that the wheel - the biggest one, in the main part of the picture - had the most positive influence on classifying this image as 'wheelchair racing'.

With the IntegratedGradients method we can also see the shape of a wheel, marked green as having positive influence; however, the picture is chaotic.

In this case the SHAP method also clearly shows more than one wheel, all coloured green, i.e. having positive impact. I think it is one of the best explanation results for this model in this file.

Third correct photo

People doing 'rollerblade racing'. I was surprised that the model was right on this picture, as the rollerblades are hardly visible. I hoped the explanation would show the rollerblades...

The LIME method did not even work here, as if the model had not tried to classify this picture based on any observation of the image, but had simply guessed.

With IntegratedGradients we can hardly see anything; knowing what is in the picture, we might take the straight lines for the men's legs, but that is it.

Even the SHAP explanation for this picture does not seem to make much sense. We can see some parts of people, but nothing that looks like rollerblades. For me this was a bit of a disappointing example.

Fourth correct photo

On the one hand, this one seems to have the potential to be well explained, as the javelin is clearly visible; on the other hand, the javelin is small in the picture.

In this case too, LIME had problems saying anything at all...

IntegratedGradients is still messy; however, the javelin is clearly visible as a neutral surface.

Finally, the SHAP method shows some interesting results. From the colours of this explanation we can easily guess that the picture shows a human with a javelin. Additionally, the colours are in positions that make logical sense.

Fifth correct photo

Three hockey players and someone, probably a referee, between them. The referee makes the picture less readable, as he is wearing dark clothes. Hopefully the model can still explain why the prediction is correct.

Surprisingly, we can see an interesting pattern in the LIME explanation, as one of the players seems to be visible.

IntegratedGradients is still messy here; we can only say that the bottom-left corner had more impact on the model's decision - according to the IntegratedGradients explanation, of course.

Finally SHAP: the explanation makes logical sense - there are two players in green and even one of the hockey sticks. The hockey stick is the greenest thing there, so it had the biggest positive impact, which makes sense.

Sixth correct photo

Being honest, I did not know this sport before. The picture seems readable, especially the 'pipe' and the helmet. Let's see what the explainers said.

LIME decided to highlight the whole person, which in some way makes sense, because the person is in an unusual position.

This time the IntegratedGradients explanation shows some shapes - the person and part of the pipe - though I can say that only because I know the picture. But at least this time we can see which regions of the picture influenced the decision more than others.

And the SHAP explanation, without big surprises, performed best here, showing what a human would point out as the most important parts of the picture.

Incorrectly classified picture

Now I would like to look closer at the picture with the snowboard, which for a human seems easy to recognize. Despite that, the model had problems predicting the right label.

This time too, LIME had problems justifying the model's decision - even though the segmentation looks really cool.

IntegratedGradients again gave a messy output, only slightly showing which regions had more impact on the decision. These regions seem similar for both the true label and the predicted (wrong) label.

In this case the result surprised me the most. The model's explanation shows that the region most in favour of the true label is the board, which makes a lot of sense. However, the number of regions that push the model away from 'snowboarding' is too big. The model predicts 'ice climbing', which is not correct, but the explanation for this idea gives quite good reasons, namely the part of a snowy hill in the bottom-right corner.

Summary

It was really interesting to see which parts of a picture made the model decide on its prediction. It was a bit disappointing that LIME was not able to give any ideas for all of the images. Also, IntegratedGradients was nearly always too messy to make out any shapes that might have led the model to its answer. The method I liked the most was SHAP: in nearly all cases the explanations were logically acceptable to me - the same things I would point out when asked why I think the picture shows a snowboarder or a climber. The most surprising was the last, incorrectly classified picture, even though the board was correctly found. I hope this situation occurred only because the model had only 75% accuracy on this test batch. I wish training this model took less time (nearly 11 hours on my computer to compute only part of an epoch), so I could have waited longer and seen the explanations of a model with over 90% accuracy (in the source mentioned before, they achieved that).